To better understand the virological aspect of the expanding AIDS epidemic in southern Africa, a set of 23 near-full-length clones of human immunodeficiency virus type 1 (HIV-1) representing eight AIDS patients from Botswana were sequenced and analyzed phylogenetically. All study viruses from Botswana belonged to HIV-1 subtype C. The interpatient diversity of the clones from Botswana was higher than among full-length isolates of subtype B or among a set of full-length HIV-1 genomes of subtype C from India (mean value of 9.1% versus 6.5 and 4.3%, respectively; P < 0.0001 for both comparisons). Similar results were observed in all genes across the entire viral genome. We suggest that the high level of HIV-1 diversity might be a typical feature of the subtype C epidemic in southern Africa. The reason or reasons for this diversity are unclear, but may include an altered replication efficiency of HIV-1 subtype C and/or the multiple introduction of different subtype C viruses.



The majority of new human immunodeficiency virus (HIV) infections in the global AIDS epidemic are appearing in sub-Saharan Africa and Southeast Asia. Compared with the situation a decade ago, the main AIDS epidemics have shifted from central and eastern Africa to the southern regions. The most severe HIV epidemics have recently afflicted such southern African countries as Zimbabwe, Zambia, Namibia, South Africa, and Botswana (43). HIV-1 subtype C has been estimated to account for 4S% of HIV-1 infections worldwide and 5\S% of HIV-1 infections in Africa (4, 7, 14-16, 21, 31), where the main mode of transmission is heterosexual (43, 44, 47).

A rapid expansion of the HIV-1 epidemic in Botswana has occurred since the early to mid-1990s. According to the UNAIDS and World Health Organization (WHO) Global HIV/AIDS and STD Surveillance data, HIV prevalence among antenatal clinic attendees tested in the major urban areas of Botswana (Gaborone, Francistown, and Selebi-Phikwe) increased from 6% in 1990 to 39% in 1997 (range of 34 to 43%) (42). Among women 20 to 29 years of age, 43 to 44% tested HIV positive. Outside of the major urban areas, median HIV prevalence increased from no evidence of infection in 1985 to 1987 to 34% in 1997. In 1997, HIV prevalence in Botswana ranged from 28 to 38%. As such, locally circulating HIV-1 needs to be characterized thoroughly, and vital information about the nature of the epidemic should be extended (2, 4, 7, 21, 31, 37, 45-47). Moreover, Botswana's central geographic position makes a comprehensive HIV-1 molecular epidemiological study that much more urgent, because it may serve as an example of the burgeoning epidemic in southern Africa.

In this study, we report the molecular cloning and phylogenetic analysis of 23 near-full-length clones from Botswana. All of them were identified as belonging to HIV-1 subtype C and demonstrated high levels of intersample diversity across the entire viral genome. By providing new genetic information regarding locally circulating viruses, this study may contribute to AIDS vaccine design for the countries of the southern Africa region and, in particular, for Botswana.

Specimens for this study were selected from HIV-seropositive patients in Gaborone, Botswana. All HIV-1 infections in this study were likely to be heterosexually acquired. The times of infection were not known. The HIV-1-seropositive status of patients was confirmed by enzyme-linked immunosorbent assay and Western blot analysis. Clinical classification was performed by using the 1987 Centers for Disease Control and Prevention (CDC) revised classification (9) (data not shown).

Genomic DNA was obtained directly from the patients' peripheral blood mononuclear cells (PBMCs) (buffy coats) without passage through cell culture or donor PBMCs. All clones in this study were amplified in heminested PCR with three primers from the LA set (18) or their modifications. The Expand Long Template PCR system (Boehringer Mannheim, Indianapolis, Ind.) was used according to the manufacturer's instructions. Gel purification of the first-round PCR product was essential for direct amplification of 9.0-kb fragments from uncultured PBMCs. Estimation of the expanded PCR sensitivity (based on 8E5/LAV) revealed a successful amplification of the 9.0-kb fragment in the first round when at least S x 10^ to
FIG. 1. Phylogenetic relationship of the newly characterized full-length clones from Botswana (boxed in black) to other representative full-length HIV-1 sequences of subtypes A, B, C, D, F, and H and recombinant subtypes AE, AG, and DF. Full-length C sequences from India were also included in the analysis. A neighbor-joining tree was constructed on the basis of the hidden Markov model nucleotide alignment of full-length HIV-1 genomes. Subtype O ANT70 sequence was used as an outgroup. Values along the branches indicate the bootstrap values that support branching (out of 1,000 resamplings).



4x10' proviral copies were present in the reaction (data not shown). These results were consistent with those from other studies (10, 39). The TA pCR2.1 TOPO system (Invitrogen, Carlsbad, Calif.) and JM109 competent cells (Promega Corporation, Madison, Wis.) were used for cloning. Positive colonies were screened by PCR. To obtain sufficient plasmids for sequence analysis, we amplified the constructs under the previously described conditions with some modifications (41). Purified plasmid DNA served as a template for sequencing. Both-strand sequencing was combined with a strategy involving overlapping sequences. Dye terminator sequencing on an automated DNA Sequenator (model 373A; Applied Biosystems, Inc., Foster City, Calif.) was used.

A multiple alignment procedure for the full-length HIV genome was performed by using the hidden Markov model. Constructed through the HIV-1 HMMER computer program of the Los Alamos National Laboratory, the model has been previously shown to provide the best description of the true nucleotide substitution pattern of HIV-1 gag and env genes (26). The HIV-1 HMMER model (11, 12) constructed at Los Alamos National Laboratory for the full-length HIV-1 genomes (24) was employed. Sixty full-length reference sequences were included in the alignment from the GenBank data bank (5). The 3' end of the alignment, which included the coding region and 3' long terminal repeat (LTR), was adjusted manually. The pairwise evolutionary distances from nucleotide sequences were computed by the DNADIST program under Kimura's two-parameter model (17). All alignments were globally gap stripped for the generation of the trees. The transition/transversion ratio parameters were set at 3.0 for the gag gene, 1.3 for the env gene, 1.42 for the V1-V2 and V3 fragments, and 2.0 for the other viral loci (25). A tree was drawn by the Njplot (33) and TreeView (32) programs. To analyze patterns of variability along the HIV-1 genomes, the program SWAN, which utilizes a "sliding window" approach, was used (34). Positions with gaps either were or were not excluded from the analysis. The variability distribution was estimated as an entropy function of the nucleotide variation observed at a particular position. The Recombinant Identification Program (RIP) (40) and HIV-1 Subtyping Basic BLAST (3) were used in searching for recombination among the clones studied.
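The pairwise distance calculation described above can be illustrated with a minimal Python sketch of the classic closed-form Kimura two-parameter estimator. This is an illustration only and not the DNADIST implementation: DNADIST works with a fixed, user-supplied transition/transversion ratio, whereas the closed form below estimates transitions and transversions from each pair, so the two can give slightly different values. The function name `k2p_distance` and the per-pair skipping of gapped sites are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Closed-form Kimura two-parameter distance for two aligned sequences.

    Illustrative only: sites with a gap or ambiguity code in either
    sequence are skipped, which is a per-pair form of gap stripping.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: not comparable
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # d = -1/2 * ln[(1 - 2P - Q) * sqrt(1 - 2Q)]; math.log raises ValueError
    # for saturated pairs where the argument is not positive.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Multiplying the returned value by 100 gives a figure comparable in form to the percent diversity values reported in this study.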
Sequence analysis of the Botswana HIV-1 revealed that 10 of 23 clones had an intact genomic organization with open reading frames. The other clones had point mutations and/or insertions and deletions resulting in frameshifts, disabled start codons, or premature stop codons. No major deletions or rearrangements were observed. Determined length polymorphism among studied sequences was limited to the vpu (15- to 18-nucleotide [nt] insertion at the 5' end), env (from 3- to 9-nt deletions to 4S-nt insertions, GIGRGQ motif in the BW17 V3 loop), 2nd exon of rev (a 13-amino-acid truncation at the 3' end), nef (6- to 15-nt deletions in some clones), and regulatory regions of the LTR (three or four NF-κB sites with GGGACTTTCT as a potential fourth NF-κB in two clones of BW05).

An evolutionary tree in Fig. 1 shows the phylogenetic relationship of the full-length Botswana clones to other representative full-length HIV-1 sequences. All Botswana sequences, a set from India (27), and two subtype C reference sequences (C-ETH2220 [35] and C-92BR025 [19]) clustered together, forming a compelling subtype C outcropping on the phylogenetic tree. This cluster was separated from other HIV-1 sequences by the extremely high bootstrap value of 1,000 (out of 1,000 resamplings). Phylogenetic relationships within the subtype C bush were also noteworthy. Assuming that the circled node at the center of the bush could represent the potential ancestral subtype C node, we observed the following. (i) The star-like phylogeny of the subtype C bush together with its branching order may demonstrate the relatively high diversity of the Botswana samples. (ii) Four Botswana sample clones (BW03, BW05, BW15, and BW16), together with all five sequences from India, formed a potential subcluster, although the bootstrap value was not high. (iii) All Indian samples were separated by bootstrap values of 1,000, possibly reflecting a "founder effect" among these sequences. (iv) Three Botswana sample clones (BW04, BW11, and BW12) may represent individualized groups of sequences inherited from a common subtype C ancestor. (v) Two reference sequences, ETH2220 and 92BR025, deviated together with a high bootstrap value (1,000), possibly reflecting another subtype C subcluster differing significantly from Botswana or Indian samples. (vi) One of the Botswana samples (BW17) strayed rather far off the main subtype C bush and may be the least representative of Botswana HIV-1 samples. (vii) The topology of the Botswana clones confirms that clones of the same samples are closely related to each other based on full-length genome sequences. A multilocus analysis was congruent with full-genome phylogenetic analysis and confirmed clustering of newly derived Botswana clones within subtype C across the entire HIV-1 genome (data not shown).

To characterize the level of variability among Botswana clones across the entire HIV-1 genome, we performed SWAN program analysis as an entropy function of the nucleotide variation. The Botswana set had greater variability than subtype B samples (Fig. 2) (mean values of 11.6 and 8^%, respectively, for gap-stripped analysis). The profiles of viral variability across the HIV-1 genome were similar among subtype B and C viruses. Comparison of gap-stripped and gap-nonstripped plots revealed that the differences in mean values between the two methods of computing and the shape of variability plot profiles were not significant (data not shown). Gap stripping slightly decreased the mean value and the number of sliding window sites across the genome. It also hid the extreme regions with the highest level of variability. Variability, measured as an entropy function in this study, and diversity, described before as a pairwise comparison of the sequences (19), exhibited similar profiles of variable and conservative genomic regions. Variability plots (especially non-gap-stripped) revealed higher peaks in variable regions than diversity plots.
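The sliding-window variability analysis can be sketched in Python as follows. This is a simplified illustration of the general idea (per-column Shannon entropy averaged over a moving window), not the SWAN program itself. The function names, the default window and step sizes, and the choice to ignore gap characters inside a non-stripped column are all assumptions made for this example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the nucleotides in one alignment column."""
    counts = Counter(base for base in column if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sliding_window_entropy(alignment, window=100, step=10, strip_gaps=True):
    """Mean per-site entropy in each window along an alignment.

    alignment: list of equal-length aligned sequences (strings).
    strip_gaps: if True, columns containing any gap are removed first,
    analogous to the gap-stripped analysis described in the text.
    Returns an empty list if the window is longer than the alignment.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    columns = ["".join(seq[i].upper() for seq in alignment) for i in range(length)]
    if strip_gaps:
        columns = [col for col in columns if "-" not in col]

    entropies = [column_entropy(col) for col in columns]
    return [
        sum(entropies[start:start + window]) / window
        for start in range(0, len(entropies) - window + 1, step)
    ]
```

Running the function once with `strip_gaps=True` and once with `strip_gaps=False` mirrors the gap-stripped versus non-gap-stripped comparison: stripping removes columns, so it reduces the number of windows and can hide the most variable, indel-rich regions.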

Table 1 depicts the high degree of intersample diversity across the entire HIV-1 genome among Botswana clones compared with subtype B sequences and subtype C sequences from India (27). Because AIDS patients might be expected to have higher variability, we made the same comparison, limiting the subtype B reference to eight sequences selected from confirmed AIDS patients (column 2). The intersample diversity among Botswana clones was significantly higher than that among subtype B references or Indian samples on the level of the full-length HIV-1 genome. Across the viral genome, the mean diversity among Botswana samples was congruent with the full-length genome analysis. The intersample diversity analysis statistically confirmed the phylogenetic study observations (Fig. 1 and 2) that the newly characterized Botswana clones were highly diversified.

TABLE 1. Intersample HIV-1 diversity in this study^a

Mean % diversity (range)

Gene or region       Subtype C, Botswana   Subtype B (8 sequences)   Subtype B            Subtype C, India
gag                  7.9 (y.!»-9.2)        5.6' (3.6-9.6)            4.9* (l.(Wl.(;)      3.*' (1.4-4.6)
pol                  5.9 (4.3-7.8)         4.1' (i«>^.4)             3.9* (0.«-(>.2)      2.6* (1^3.7)
vif                  7Jt (4.^-12.S)        dJt (3.7-9.J)             6.4t (0-P.5)         3.2' (l-*-^-^)
vpr                  9 (5. VI 4.9)         6^* (4 *-e.4)             5.9* (1.7-10.8)      4.6* (2.1-4.1)
tat                  9.0 (6.9-14.9)        7^* (4.6-10.7)            rr (3.*-I0-7)        3 7» (2.7-5.1)
rev                  10.4 (7.7-16.6)       9.4t (4.2-14.2)           e.2* (^.3-I4.2)      4.7* (3.fl-J 7)
vpu                  13.9 (10.1-18.6)      10.6' (6.1-14.3)          9.5* (4.J-154)       5,r (2.1-8.1)
env                  12.4 (10.4-14.3)      9 J' (6.2-11.9)           9.5* (!t.M17)        6.9* (5.1-9.?)
V1-V2                15.7 (35.9-36.7)      t8.0" (8.6-26.1)          1<J.0* (8.6-V).6)    25.8 (14.2-34.8)
V3                   14.4 (11.5-18.9)      11.6' (6.7-17.3)          12.1* (6-0-)S-:)     6.4* (5.1-7.9)
nef                  11J (*.0-I5.1)        9.«| (6.3-15.7)           9.2» (3.5-1 6 J)     5 J* (4.4-6.4)
3' LTR               9.7 (6.6-13.4)        (5.1-11.(5)               7.3* (1.0-n.O)       4.0- (3.2-5.4)
Full-length genome   9.1 (7.7-10.7)        6.6' (4i-4.0)             6.5' (J.5-9.6)       4.3' (3.2-5.7)

^a Full-length HIV-1 genomes were used in the analysis, based on the hidden Markov model alignment of the entire HIV-1 genome. The following 26 sequences of subtype B were used: AUMBCC54, C18MBC, 1>H123, 89.6, RF, WEAU, HAN, HIVMN, BCSG3, OYI, CAM1, NY5, pNL43, LAI, HXB2, JRFL, JRCSF, AUMBC200, YU2, YU10, ACH320A, ACH320B, SF2, HIV1 AD8, D31, MANC, and WR27. Sequences AUMBCC54, C18MBC, and NY5 were excluded from the jukJ 3' LTR analysis because of deletions or the absence of sequences for these regions. Pairwise distances in four groups of sequences (pNL43, LAV, and HXB2; JRFL and JRCSF; YU2 and YU10; and ACH320A and ACH320B) were excluded from the analysis. The eight subtype B sequences from AIDS patients were JRFL, YU2, 89.6, RF, HAN, MN, BCSG3, and WR27. Thirteen sequences of subtype C were included in the analysis: 8 clones from Botswana (1 from each patient) and 5 sequences from India (301999, 21068, 301905, 301904, and 11246). All distances were calculated by the DNADIST program from the PHYLIP v. 3.572 package based on hidden Markov model alignment. The transition/transversion ratios were set to 3.0 for gag, 1.3 for env, 1.42 for V3 and V1-V2, and 2.0 for all other HIV-1 genes.

^b Statistical significance versus Botswana HIV-1 clones is shown as follows: *, P < 0.0001; #, P = OjOOS; j, P = 0-OJ3; X, P = 0.02S; t, P = aoSJ; and }, P = 0.13.
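The intersample comparison summarized in Table 1 amounts to taking the mean and range of pairwise distances between sequences from different sources, leaving out pairs that come from the same patient or the same parental isolate. A minimal Python sketch of that bookkeeping is given below. The function name, the dictionary-based inputs, and the sequence labels in the usage note are hypothetical and serve only to illustrate the calculation; no statistical test is included because the text does not specify one.

```python
from itertools import combinations
from statistics import mean


def intersample_diversity(distances, source_of):
    """Mean, minimum, and maximum pairwise distance between sources.

    distances: dict mapping a (name_a, name_b) tuple to a pairwise distance;
               either ordering of the pair is accepted.
    source_of: dict mapping each sequence name to its patient or isolate.
    Pairs of sequences that share a source are excluded.
    """
    values = []
    for a, b in combinations(sorted(source_of), 2):
        if source_of[a] == source_of[b]:
            continue  # same patient or isolate: not an intersample pair
        d = distances.get((a, b), distances.get((b, a)))
        if d is None:
            raise KeyError(f"no distance recorded for pair ({a}, {b})")
        values.append(d)
    if not values:
        raise ValueError("no intersample pairs available")
    return mean(values), min(values), max(values)
```

For example, with hypothetical clones `p1_c1` and `p1_c2` from patient `p1` and clone `p2_c1` from patient `p2`, only the two between-patient distances contribute to the result, and the within-patient pair is ignored.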

The results of intrasample diversity analysis were limited by the methods used (PCR amplification and cloning) and by the available number of multiple subtype B full-length clones. Seven Botswana samples (except BW12) and three subtype B sets (JRFL, YU, and ACH320) were compared across the HIV-1 genome. The range of full-length diversity among Botswana samples was 0.3% (BW17 clones) to 2.9% (BW04 clones), with an average mean value of \A%. Intrasample diversity showed no significant difference between subtype B and C sequences (Fig. 3). However, two concentrations of diversity (low and high) were revealed among both subtype B and C sets (Fig. 3). These concentrations of low and high diversity were distributed across the entire genome and were found to be more consistent in the structural genes (gag, pol, and env).

All Botswana sequences were checked for potential recombination sites by the HIV-1 Subtyping Basic BLAST (3) and by RIP (40), the results consistently showing no evidence of recombination.

Clustering with HIV-1 subtype C and the high intersample diversity were the most exceptional attributes of the 23-clone set from Botswana. A star-like shape of the subtype C cluster in the phylogenetic tree was accompanied by extremely high bootstrap values across tree branches. The topology of the phylogenetic tree suggested that a common ancestor for the Botswana sequences might have existed before the common ancestor for the Indian sequences analyzed or before the strains C-92BR025 (Brazil) and C-ETH2220 (Ethiopia) diverged.

Intersample diversity within subtype C has been previously found to vary from 5 to 11.5% (1, 7, 38). Higher levels of diversity were found among Botswana clones in this study, in spite of the fact that samples were taken from one place and at one time point. Both full-genome sequences and multiple subgenomic loci demonstrated the same patterns, with a higher mean value of variability among the Botswana samples.

The increased genetic diversity of subtype C viruses in Botswana might have different underlying causes, including a variety of host and viral factors. Among the latter factors, a combination of the genetic flexibility of subtype C virus and its multiple introductions might be the most important. A number of recent findings argue that one possible cause of the high viral diversity in the Botswana epidemic could be higher flexibility of subtype C virus and its altered ability to diversify. These arguments include, but are not limited to, the following. (i) Subtype C is predominant in most recent HIV-1 epidemics worldwide (1, 7, 27, 35, 36, 38, 45, 47). (ii) The highest prevalence of HIV-1 infection in various epidemics is caused mainly by subtype C virus. (iii) Subtype C virus may have a faster disease progression (20), and patients infected with HIV-1 subtype C developed AIDS earlier than patients with subtype A virus (23). (iv) Three or four NF-κB sites (instead of two) might lead to more efficient viral transcription (\\, 29, 30). (v) The TNF-α response to subtype C virus is significantly higher than to HIV-1 subtype B (28, 29), suggesting the possibility of increased viral transcription and replication in correlation with NF-κB copy number (28, 29). (vi) The viral load of subtype C infections may be higher in different compartments, which might cause an increased level of viral transmission (22). On the other hand, a scenario that suggests independent diversification of the virus in other regions and delayed entry of the epidemic in Botswana, followed by multiple introductions of the subtype C virus from adjacent countries, cannot be excluded (4.2^).

Botswana is geographically located at the center of the AIDS epidemic in southern Africa. UNAIDS and World Health Organization surveillance data suggest that the widespread rise of the HIV-1 epidemic in Botswana started in the early to mid-1990s and reached one of the highest prevalence rates in Africa (42-44). For more recent HIV-1 epidemics, such as those described in Thailand and India, one might expect to find a highly homogeneous pool of local viruses that formed a monophyletic phylogenetic subcluster with relatively short and aggregated branches. However, the findings in this study contradict the established trend.

Extremely high interpatient diversity across the genome was supported by long branch lengths in the phylogenetic trees throughout the Botswana viruses within the genetic subtype C. No multiple subtypes or recombination were found in this study. Because it currently has the highest incidence rates of



